GARFIELD-NGS: Genomic vARiants FIltering by dEep Learning moDels in NGS
Abstract
Exome sequencing is extensively used in research and diagnostic laboratories to discover pathological variants and to study the genetic architecture of human diseases. Although current platforms produce high-quality sequencing data, false-positive variants remain an issue and can confound subsequent analysis and interpretation of results. Here, we propose a new tool named GARFIELD-NGS (Genomic vARiants FIltering by dEep Learning moDels in NGS), which uses a deep learning algorithm to distinguish false from true variants in exome sequencing experiments performed on Illumina or ION platforms. GARFIELD-NGS consists of 4 distinct models (Illumina INDELs, Illumina SNPs, ION INDELs, and ION SNPs), tested on the NA12878 gold-standard exome variant dataset (NIST v.3.3.2). The AUC values for these variant categories are 0.9267, 0.7998, 0.9464, and 0.9757, respectively. GARFIELD-NGS is also robust on low-coverage data (down to 30X) and on Illumina two-colour data. Our tool outperformed previous hard filters, and it calculates a score from 0.0 to 1.0 for each variant, allowing different thresholds to be applied according to the desired level of sensitivity and specificity. GARFIELD-NGS processes standard VCF input using Perl and Java scripts and produces a regular VCF output, so it can be easily integrated into existing analysis pipelines. GARFIELD-NGS is freely available at https://github.com/gedoardo83/GARFIELD-NGS.
bioRxiv preprint first posted online Jun. 14, 2017; doi: http://dx.doi.org/10.1101/149146. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-NC 4.0 International license.
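The abstract states that every variant receives a score between 0.0 and 1.0 and that the cut-off can be moved to favour sensitivity or specificity. The Python sketch below illustrates that idea under explicit assumptions; it is not the tool's own Perl/Java code. The score is read from a hypothetical INFO key `GARFIELD_SCORE` (the tag GARFIELD-NGS actually writes may be named differently), records below the threshold are simply dropped rather than flagged, and the truth labels used for sensitivity, specificity and AUC are assumed to come from a benchmark such as the NIST NA12878 set. The AUC here is computed with the pairwise (Mann-Whitney) definition.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical INFO key: the tag actually written by GARFIELD-NGS may differ.
SCORE_KEY = "GARFIELD_SCORE"


def parse_score(info: str, key: str = SCORE_KEY) -> Optional[float]:
    """Return the float value of `key` in a VCF INFO column, or None if absent."""
    for field in info.split(";"):
        name, sep, value = field.partition("=")
        if name == key and sep:  # flag-type INFO entries have no "=" and are skipped
            return float(value)
    return None


def filter_vcf(lines: Iterable[str], threshold: float,
               key: str = SCORE_KEY) -> Iterator[str]:
    """Pass header lines through unchanged; keep records whose score >= threshold."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        columns = line.rstrip("\n").split("\t")
        score = parse_score(columns[7], key)  # INFO is the 8th VCF column
        if score is not None and score >= threshold:
            yield line


def sensitivity_specificity(scores: List[float], is_true: List[bool],
                            threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when score >= threshold is called a true variant."""
    tp = sum(1 for s, t in zip(scores, is_true) if t and s >= threshold)
    fn = sum(1 for s, t in zip(scores, is_true) if t and s < threshold)
    tn = sum(1 for s, t in zip(scores, is_true) if not t and s < threshold)
    fp = sum(1 for s, t in zip(scores, is_true) if not t and s >= threshold)
    # Assumes at least one true and one false variant are present.
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(scores: List[float], is_true: List[bool]) -> float:
    """AUC as the probability that a random true variant outscores a random false one."""
    pos = [s for s, t in zip(scores, is_true) if t]
    neg = [s for s, t in zip(scores, is_true) if not t]
    # Ties count as half a win, matching the Mann-Whitney formulation.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

With the toy scores [0.95, 0.80, 0.40, 0.30, 0.10] and labels [True, True, False, True, False], a threshold of 0.5 gives sensitivity 2/3 and specificity 1.0, while lowering it to 0.2 raises sensitivity to 1.0 at the cost of specificity 0.5. `roc_auc` returns 5/6 for the same data regardless of threshold, which is why AUC is a convenient single number for comparing models before a cut-off is chosen.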
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
Publication date: 2017